Introduction

In this RMarkdown document I will analyse “Full Rabbit Dataset.csv”, which is the main focus of the project. This data set contains point occurrence counts of invasive European rabbits (Oryctolagus cuniculus) in Australia. These point counts were collated from various direct and indirect studies, as well as citizen science observations, over the period 1760 to 2015 and form part of the long-term rabbit data set (Roy-Dufresne et al. 2019). The data set also contains a suite of environmental variables as well as the presence-absence data from the “Species Pseudoabsence Generation.Rmd” document. Finally, various abundance estimates were derived from the data provided by the database and from the “Estimating Rabbit Abundance.Rmd” document.

Variable names:

The 13 vegetation types are a re-classification of all the vegetation types in Australia as described by the classification scheme of the Australian Government's Environment Department; the original classes can be found at “https://www.awe.gov.au/agriculture-land/land/native-vegetation/national-vegetation-information-system/data-products”. The re-classifications used in the data set are found below:

The factor levels in Diseases are coded 0-3; what each number corresponds to is given below:

As Season is coded according to the Australian calendar, the numeric coding of 1-4 represents different seasons than would be the case for a Northern Hemisphere nation:

The State variable refers to the 8 states/territories of Australia, coded with their 2- or 3-letter abbreviations; the full names of each state/territory are given below:

Finally, in all the animal presence/absence factors 1 represents presences and 0 represents absences.

The aim of the project is to determine the drivers of rabbit occurrence patterns, given the variables in the data set, at different spatial scales and to compare the variables retained in each final model. The scales will be the country scale, the state/territory scale and the transect scale. The transect scale will consist of random transects sampled from the data on a North-South axis and an East-West axis.

R Environment Set Up and Importing the Data

I will start by loading the R packages that I will be using for this analysis.

library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.
library(ggplot2)
library(ggcleveland)
## Warning: package 'ggcleveland' was built under R version 4.1.2
library(patchwork)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## v purrr   0.3.4
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::collapse() masks nlme::collapse()
## x dplyr::filter()   masks stats::filter()
## x dplyr::lag()      masks stats::lag()
library(effects)
## Warning: package 'effects' was built under R version 4.1.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.1.2
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(car)
## Warning: package 'car' was built under R version 4.1.2
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some

There are also some custom functions that I want to create, which will be helpful for graphical data analysis and plotting with ggplot2.

#Augmented pairs plot
panel.hist = function(x, ...) {
  usr = par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 2.5))
  hist(x, freq = FALSE, col="cyan", add=TRUE) 
  lines(density(x))
}
panel.cor = function(x, y, digits = 2, prefix = "", cex.cor, ...){
  usr = par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r = abs(cor(x, y))
  txt = format(c(r, 0.123456789), digits = digits)[1]
  txt = paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs2 = function (x) {
  pairs(x, lower.panel = panel.smooth, upper.panel = panel.cor, diag.panel = panel.hist)
}

#Co-plot panel function
coplot.ablines = function(x, y, ...){
  tmp = lm(y ~ x, na.action = na.omit)
  abline(tmp)
  points(x, y)
}

#Custom ggplot theme
theme_customized = function(base_size = 13, base_family = "", base_line_size = base_size/22, base_rect_size = base_size/22){
  theme(
    axis.title = element_text(size = 13),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    plot.caption = element_text(size = 10, face = "italic"),
    panel.background = element_rect(fill = "white"),
    axis.line = element_line(size = 1, colour = "black"),
    strip.background = element_rect(fill = "#cddcdd"),
    panel.border = element_rect(colour = "black", fill = NA, size = 0.5),
    strip.text = element_text(colour = "black"),
    legend.key = element_blank()
  )
}

I would also like to note that the custom R functions above were provided to me during statistics courses run by Dr Alex Douglas and Dr Thomas Cornulier at the University of Aberdeen.

Now let's finally import the data.

Rabbit = read.table("E:/Masters Project/BI5002 (Masters Project)/Invasive European Rabbit Data/Full Rabbit Dataset.csv", 
                    header = TRUE, stringsAsFactors = TRUE, sep = ",")
str(Rabbit)
## 'data.frame':    689265 obs. of  37 variables:
##  $ Occurrence_ID       : int  683808 683809 684986 684987 684988 686921 686922 686923 686924 686925 ...
##  $ Lat                 : num  -37.1 -38.4 -36.8 -36.5 -36.6 ...
##  $ Long                : num  148 145 147 145 147 ...
##  $ Occurences          : int  33 24 5 2 7 159 7 57 57 111 ...
##  $ Abund.1             : int  33 24 5 2 7 159 7 57 57 111 ...
##  $ Abund.2             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Abund.3             : num  5.14e-04 3.74e-04 7.78e-05 3.11e-05 1.09e-04 ...
##  $ No.of.10km.cells    : int  6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
##  $ Year                : int  1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
##  $ Day                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Psea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TAvg_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMax_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMin_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TSea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWet_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWrm_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgAutumn30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSummer30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSpring30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgWinter30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistPermWater       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistAgriLand        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PercSoilClay        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MinDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VarDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ State               : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ VegeType            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Season              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Month               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Diseases            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Red.Fox             : int  1 1 1 1 1 1 0 0 0 0 ...
##  $ Dingo               : int  1 1 1 1 1 1 0 0 0 0 ...
##  $ Feral.Cat           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Whistling.Kite      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Wallaby.Sp          : int  1 1 1 1 1 1 1 1 1 1 ...
head(Rabbit, n = 10)
tail(Rabbit, n = 10)

The categorical variables have been read in as integers, so I will need to convert them to factors, but everything else looks okay with the data frame.

#Factorise the categorical variables
Rabbit$VegeType = factor(Rabbit$VegeType)
Rabbit$Season = factor(Rabbit$Season)
Rabbit$Month = factor(Rabbit$Month)
Rabbit$Diseases = factor(Rabbit$Diseases)
Rabbit$Red.Fox = factor(Rabbit$Red.Fox)
Rabbit$Dingo = factor(Rabbit$Dingo)
Rabbit$Feral.Cat = factor(Rabbit$Feral.Cat)
Rabbit$Whistling.Kite = factor(Rabbit$Whistling.Kite)
Rabbit$Wallaby.Sp = factor(Rabbit$Wallaby.Sp)

#Re-check the data set
str(Rabbit)
## 'data.frame':    689265 obs. of  37 variables:
##  $ Occurrence_ID       : int  683808 683809 684986 684987 684988 686921 686922 686923 686924 686925 ...
##  $ Lat                 : num  -37.1 -38.4 -36.8 -36.5 -36.6 ...
##  $ Long                : num  148 145 147 145 147 ...
##  $ Occurences          : int  33 24 5 2 7 159 7 57 57 111 ...
##  $ Abund.1             : int  33 24 5 2 7 159 7 57 57 111 ...
##  $ Abund.2             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Abund.3             : num  5.14e-04 3.74e-04 7.78e-05 3.11e-05 1.09e-04 ...
##  $ No.of.10km.cells    : int  6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
##  $ Year                : int  1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
##  $ Day                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Psea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TAvg_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMax_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMin_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TSea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWet_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWrm_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgAutumn30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSummer30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSpring30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgWinter30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistPermWater       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistAgriLand        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PercSoilClay        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MinDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VarDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ State               : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ VegeType            : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ Season              : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Month               : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Diseases            : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Red.Fox             : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 1 ...
##  $ Dingo               : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 1 ...
##  $ Feral.Cat           : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Whistling.Kite      : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Wallaby.Sp          : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
summary(Rabbit)
##  Occurrence_ID         Lat              Long         Occurences    
##  Min.   :     1   Min.   :-43.49   Min.   :113.0   Min.   :     1  
##  1st Qu.:172317   1st Qu.:-34.15   1st Qu.:139.1   1st Qu.:  5779  
##  Median :344633   Median :-34.15   Median :139.2   Median : 45089  
##  Mean   :344634   Mean   :-34.02   Mean   :140.1   Mean   : 44618  
##  3rd Qu.:516949   3rd Qu.:-33.25   3rd Qu.:139.4   3rd Qu.: 69866  
##  Max.   :689285   Max.   :-12.35   Max.   :153.7   Max.   :104045  
##                                                                    
##     Abund.1          Abund.2          Abund.3       No.of.10km.cells
##  Min.   :     1   Min.   :  0.0    Min.   :   0.0   Min.   :   1.0  
##  1st Qu.:  8072   1st Qu.:  0.0    1st Qu.: 216.7   1st Qu.:   2.0  
##  Median : 45089   Median :  1.0    Median :2254.4   Median :   2.0  
##  Mean   : 45553   Mean   :  7.0    Mean   :2202.2   Mean   : 444.2  
##  3rd Qu.: 69866   3rd Qu.:  6.0    3rd Qu.:3493.3   3rd Qu.:   2.0  
##  Max.   :104045   Max.   :546.7    Max.   :5202.2   Max.   :6425.0  
##  NA's   :16774    NA's   :638089                                    
##       Year            Day        A_Prec_Avg30Yr   A_Psea_Avg30Yr  
##  Min.   :1760    Min.   : 1.00   Min.   : 135.5   Min.   :  9.73  
##  1st Qu.:2007    1st Qu.: 9.00   1st Qu.: 262.6   1st Qu.: 23.40  
##  Median :2009    Median :13.00   Median : 336.8   Median : 27.34  
##  Mean   :2007    Mean   :16.06   Mean   : 379.9   Mean   : 29.87  
##  3rd Qu.:2009    3rd Qu.:24.00   3rd Qu.: 423.0   3rd Qu.: 36.36  
##  Max.   :2015    Max.   :31.00   Max.   :3270.0   Max.   :137.75  
##  NA's   :16774   NA's   :70248   NA's   :39295    NA's   :39295   
##  A_TAvg_Avg30Yr  A_TMax_Avg30Yr  A_TMin_Avg30Yr  A_TSea_Avg30Yr 
##  Min.   : 4.83   Min.   :15.15   Min.   :-4.95   Min.   :158.7  
##  1st Qu.:14.78   1st Qu.:29.55   1st Qu.: 3.04   1st Qu.:472.2  
##  Median :15.66   Median :30.38   Median : 4.13   Median :478.1  
##  Mean   :15.52   Mean   :30.28   Mean   : 3.83   Mean   :491.1  
##  3rd Qu.:16.14   3rd Qu.:31.36   3rd Qu.: 4.61   3rd Qu.:527.7  
##  Max.   :28.21   Max.   :41.84   Max.   :18.46   Max.   :676.2  
##  NA's   :39295   NA's   :39295   NA's   :39295   NA's   :39295  
##  A_TWet_Avg30Yr  A_TWrm_Avg30Yr  A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr
##  Min.   : 1.06   Min.   : 9.80   Min.   :  8.11       Min.   :  5.31      
##  1st Qu.: 9.82   1st Qu.:20.88   1st Qu.: 18.71       1st Qu.: 22.25      
##  Median :12.01   Median :21.69   Median : 22.84       Median : 23.52      
##  Mean   :12.24   Mean   :21.66   Mean   : 26.84       Mean   : 27.84      
##  3rd Qu.:13.72   3rd Qu.:22.36   3rd Qu.: 29.75       3rd Qu.: 24.23      
##  Max.   :32.50   Max.   :33.11   Max.   :447.45       Max.   :474.84      
##  NA's   :39295   NA's   :39295   NA's   :39295        NA's   :39295       
##  A_Prec_AvgSpring30Yr A_Prec_AvgWinter30Yr DistPermWater      DistAgriLand   
##  Min.   :  0.85       Min.   :  1.23       Min.   :  0.000   Min.   :  0.00  
##  1st Qu.: 27.68       1st Qu.: 24.45       1st Qu.:  1.780   1st Qu.:  1.57  
##  Median : 32.78       Median : 35.97       Median :  3.360   Median :  6.76  
##  Mean   : 37.35       Mean   : 38.33       Mean   :  3.897   Mean   : 21.99  
##  3rd Qu.: 42.73       3rd Qu.: 48.52       3rd Qu.:  4.010   3rd Qu.: 20.24  
##  Max.   :251.57       Max.   :286.80       Max.   :170.370   Max.   :915.02  
##  NA's   :39295        NA's   :39295        NA's   :1073      NA's   :1073    
##   PercSoilClay    MinDayLength     VarDayLength       State       
##  Min.   : 5.00   Min.   : 8.950   Min.   :0.250   QLD    :543602  
##  1st Qu.:20.90   1st Qu.: 9.870   1st Qu.:2.310   VIC    :105780  
##  Median :28.10   Median : 9.870   Median :2.480   WA     : 18385  
##  Mean   :25.29   Mean   : 9.878   Mean   :2.478   ACT    :  8644  
##  3rd Qu.:29.99   3rd Qu.: 9.950   3rd Qu.:2.480   SA     :  6394  
##  Max.   :57.55   Max.   :11.400   Max.   :4.940   NT     :  3498  
##  NA's   :1078    NA's   :1047     NA's   :1047    (Other):  2962  
##     VegeType       Season           Month        Diseases      Red.Fox      
##  11     :413419   1   :142960   4      :155686   0   :   278   0   :   121  
##  10     :113955   2   :289616   3      :113069   1   : 26021   1   : 66993  
##  3      :113598   3   :160038   7      :110078   2   :643287   NA's:622151  
##  4      : 13507   4   : 69582   1      : 92315   3   :  2905                
##  2      : 11864   NA's: 27069   2      : 39119   NA's: 16774                
##  (Other): 21795                 (Other):153342                              
##  NA's   :  1127                 NA's   : 25656                              
##   Dingo        Feral.Cat     Whistling.Kite Wallaby.Sp   
##  0   :   121   0   :   121   0   :   121    0   :   121  
##  1   : 13871   1   : 13816   1   :144911    1   : 73930  
##  NA's:675273   NA's:675328   NA's:544233    NA's:615214  
##                                                          
##                                                          
##                                                          
## 
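
The str() output above reports 20 levels for Month, starting at "0", whereas calendar months would give 12. I have not confirmed what the extra codes mean, so the following is only a minimal sketch of how such a check could be done, together with a more compact way of factorising several columns at once. It runs on a small made-up data frame (`toy`, with hypothetical values), not on the rabbit data.

```r
# Toy stand-in with integer-coded categorical columns (hypothetical values)
toy = data.frame(
  Season = c(1L, 2L, 3L, 4L, 1L),
  Month  = c(1L, 4L, 7L, 0L, 13L),
  Dingo  = c(1L, 0L, NA, 1L, 0L)
)

# Convert every listed column to a factor in one step
cat_cols = c("Season", "Month", "Dingo")
toy[cat_cols] = lapply(toy[cat_cols], factor)

# Month codes that are not calendar months 1-12
unexpected_months = setdiff(levels(toy$Month), as.character(1:12))
unexpected_months
```

Applied to the real data, any codes returned by the last line would need checking against the original database before Month is used as a predictor.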

In order to look at the drivers of rabbit occurrences at different scales I need to subset the data into new data frames based on these scales. First I will create a subset data frame for each Australian state/territory, naming each one “Rabbit_” followed by the state/territory code (e.g. Rabbit_ACT).

Rabbit_ACT = Rabbit[Rabbit$State == "ACT", ]
Rabbit_NSW = Rabbit[Rabbit$State == "NSW", ]
Rabbit_NT = Rabbit[Rabbit$State == "NT", ]
Rabbit_QLD = Rabbit[Rabbit$State == "QLD", ]
Rabbit_SA = Rabbit[Rabbit$State == "SA", ]
Rabbit_TAS = Rabbit[Rabbit$State == "TAS", ]
Rabbit_VIC = Rabbit[Rabbit$State == "VIC", ]
Rabbit_WA = Rabbit[Rabbit$State == "WA", ]

A note to myself: the Tasmania data set only has 365 observations whilst the other 7 data sets have thousands of observations. Given the number of potential predictors, this may limit the number of parameters that can be fitted in statistical models for Tasmania, whereas there should not be any such limitation for the other state/territory-specific data sets.
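
As a minimal sketch of how the per-state sample sizes could be tabulated, the example below splits a small made-up data frame (`toy`, hypothetical values) by State and counts the rows in each piece. split() is an alternative to writing one subsetting line per state; it is shown here only for illustration.

```r
# Toy stand-in for the Rabbit data frame (hypothetical values)
toy = data.frame(
  State = factor(c("ACT", "ACT", "TAS", "VIC", "VIC", "VIC")),
  Occurences = c(3, 5, 1, 8, 2, 4)
)

# One data frame per state, held together in a named list
by_state = split(toy, toy$State)

# Number of observations available for each state
n_per_state = sapply(by_state, nrow)
n_per_state
```

Each element can then be pulled out by name, e.g. by_state$TAS.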

Creating the transect-scale data sets will be trickier. A North-South transect runs along a line of longitude, so to create the North-South transects I need to randomly sample observations across latitudes at a fixed longitude; likewise, to create the East-West transects I need to randomly sample observations across longitudes at a fixed latitude. To create transects of equal width, each fixed longitude or latitude can be treated as a band whose range corresponds to a set physical distance measured in metres.
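
A minimal sketch of this band idea for a single North-South transect is given below. It is not the code used later in this document: the data frame `toy`, the helper names and the 50 km half-width are all hypothetical, and the distance calculation uses the rough approximation that one degree of longitude spans about 111,320 m at the equator, shrinking with the cosine of latitude.

```r
# Toy stand-in for the Rabbit data frame (hypothetical coordinates)
set.seed(42)  # makes the random draw reproducible
toy = data.frame(
  Lat  = runif(5000, min = -43.49, max = -12.35),
  Long = runif(5000, min = 113.05, max = 153.65)
)

# Approximate east-west distance (metres) of each point from the meridian long0
dist_from_meridian = function(lat, long, long0) {
  abs(long - long0) * 111320 * cos(lat * pi / 180)
}

# Sample n rows (with replacement) from the rows within half_width_m of long0
sample_ns_transect = function(data, long0, half_width_m, n = 1000) {
  rows = which(dist_from_meridian(data$Lat, data$Long, long0) <= half_width_m)
  if (length(rows) == 0) stop("No observations within the band")
  # sample.int() draws positions within rows; rows[...] maps them back
  # to row numbers of the full data frame
  data[rows[sample.int(length(rows), n, replace = TRUE)], ]
}

ns_toy = sample_ns_transect(toy, long0 = 133.65, half_width_m = 50000)
range(ns_toy$Long)  # longitudes of the sampled rows
```

The important detail is that the sampled positions are mapped back through `rows`; indexing the full data frame directly with sample(nrow(subset), ...) would return rows from the top of the data frame rather than rows from the band.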

First, some starting longitudes and latitudes to make the transects from.

#Most Eastern and Western Points
Long_max = max(Rabbit$Long, na.rm = TRUE)
Long_min = min(Rabbit$Long, na.rm = TRUE)
Long_max
## [1] 153.65
Long_min
## [1] 113.05
#Most Southern and Northern Points
Lat_min = min(Rabbit$Lat, na.rm = TRUE)
Lat_max = max(Rabbit$Lat, na.rm = TRUE)
Lat_min
## [1] -43.49
Lat_max
## [1] -12.35

As there is only a difference of approximately 40 degrees between Long_max and Long_min, I will make 8 North-South transects spaced approximately 5 degrees of longitude apart. For latitude there is only a difference of approximately 31 degrees between Lat_max and Lat_min, so I will make 6 East-West transects spaced approximately 5 degrees of latitude apart.
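
The arithmetic behind these choices can be sketched as follows, using the extremes printed above. The North-South positions follow the pattern of stepping west in 5 degree increments and placing the last transect at the minimum longitude; placing the last East-West transect at the minimum latitude in the same way is an assumption made for this illustration, and the vector names are hypothetical.

```r
# Extremes as printed above
Long_max = 153.65; Long_min = 113.05
Lat_max = -12.35;  Lat_min = -43.49

# Extent of the data in degrees
long_extent = Long_max - Long_min  # 40.6
lat_extent  = Lat_max - Lat_min    # 31.14

# 7 longitudes stepping west by 5 degrees, plus the western-most longitude
NS_longs = c(seq(Long_max, by = -5, length.out = 7), Long_min)

# 5 latitudes stepping south by 5 degrees, plus the southern-most latitude
# (assumed here to mirror the North-South pattern)
EW_lats = c(seq(Lat_max, by = -5, length.out = 5), Lat_min)

NS_longs
EW_lats
```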

Now we can make the transect-level data sets.

#Helper: draw n rows (with replacement) from the rows whose coordinate matches
#target. A small tolerance guards against floating-point error in e.g. 153.65 - 5.00
sample_rows = function(data, coord, target, n = 1000) {
  rows = which(abs(coord - target) < 1e-6)
  if (length(rows) == 0) stop("No observations at coordinate ", target)
  #sample.int() picks positions within rows; rows[...] maps them back to
  #row numbers of the full data frame
  data[rows[sample.int(length(rows), n, replace = TRUE)], ]
}

#North-South Transects (fixed longitude)
NS1 = sample_rows(Rabbit, Rabbit$Long, 153.65)
NS2 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 5.00)
NS3 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 10.00)
NS4 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 15.00)
NS5 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 20.00)
NS6 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 25.00)
NS7 = sample_rows(Rabbit, Rabbit$Long, 153.65 - 30.00)
NS8 = sample_rows(Rabbit, Rabbit$Long, 113.05)

NS = rbind(NS1, NS2, NS3, NS4, NS5, NS6, NS7, NS8)
NS = as.data.frame(NS)
rm(NS1, NS2, NS3, NS4, NS5, NS6, NS7, NS8)

#East-West Transects (fixed latitude; EW6 sits at the minimum latitude, as NS8 sits at the minimum longitude)
EW1 = sample_rows(Rabbit, Rabbit$Lat, -12.35)
EW2 = sample_rows(Rabbit, Rabbit$Lat, -12.35 - 5.00)
EW3 = sample_rows(Rabbit, Rabbit$Lat, -12.35 - 10.00)
EW4 = sample_rows(Rabbit, Rabbit$Lat, -12.35 - 15.00)
EW5 = sample_rows(Rabbit, Rabbit$Lat, -12.35 - 20.00)
EW6 = sample_rows(Rabbit, Rabbit$Lat, -43.49)

EW = rbind(EW1, EW2, EW3, EW4, EW5, EW6)
EW = as.data.frame(EW)
rm(EW1, EW2, EW3, EW4, EW5, EW6)

#Add Transect Variables
x = factor(rep(letters[1:8], each = 1000))
y = factor(rep(letters[1:6], each = 1000))
NS$Transect = x
EW$Transect = y
str(NS)
## 'data.frame':    8000 obs. of  38 variables:
##  $ Occurrence_ID       : int  684986 683808 684986 683809 683809 683808 684986 684986 684987 683809 ...
##  $ Lat                 : num  -36.8 -37.1 -36.8 -38.4 -38.4 ...
##  $ Long                : num  147 148 147 145 145 ...
##  $ Occurences          : int  5 33 5 24 24 33 5 5 2 24 ...
##  $ Abund.1             : int  5 33 5 24 24 33 5 5 2 24 ...
##  $ Abund.2             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Abund.3             : num  7.78e-05 5.14e-04 7.78e-05 3.74e-04 3.74e-04 ...
##  $ No.of.10km.cells    : int  6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
##  $ Year                : int  1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
##  $ Day                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Psea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TAvg_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMax_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMin_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TSea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWet_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWrm_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgAutumn30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSummer30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSpring30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgWinter30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistPermWater       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistAgriLand        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PercSoilClay        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MinDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VarDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ State               : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ VegeType            : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ Season              : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Month               : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Diseases            : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Red.Fox             : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Dingo               : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Feral.Cat           : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Whistling.Kite      : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Wallaby.Sp          : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Transect            : Factor w/ 8 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
str(EW)
## 'data.frame':    6000 obs. of  38 variables:
##  $ Occurrence_ID       : int  683808 683808 683809 683809 683809 683808 683809 683809 683809 683809 ...
##  $ Lat                 : num  -37.1 -37.1 -38.4 -38.4 -38.4 ...
##  $ Long                : num  148 148 145 145 145 ...
##  $ Occurences          : int  33 33 24 24 24 33 24 24 24 24 ...
##  $ Abund.1             : int  33 33 24 24 24 33 24 24 24 24 ...
##  $ Abund.2             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Abund.3             : num  0.000514 0.000514 0.000374 0.000374 0.000374 ...
##  $ No.of.10km.cells    : int  6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
##  $ Year                : int  1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
##  $ Day                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Psea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TAvg_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMax_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TMin_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TSea_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWet_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_TWrm_Avg30Yr      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgAutumn30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSummer30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgSpring30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ A_Prec_AvgWinter30Yr: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistPermWater       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DistAgriLand        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PercSoilClay        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MinDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VarDayLength        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ State               : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ VegeType            : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ Season              : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Month               : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Diseases            : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Red.Fox             : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Dingo               : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Feral.Cat           : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Whistling.Kite      : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Wallaby.Sp          : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Transect            : Factor w/ 6 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
#Check the Data
summary(NS)
##  Occurrence_ID         Lat              Long         Occurences    
##  Min.   :   127   Min.   :-43.25   Min.   :113.0   Min.   :   1.0  
##  1st Qu.:682545   1st Qu.:-37.95   1st Qu.:144.2   1st Qu.:   5.0  
##  Median :683809   Median :-37.15   Median :145.3   Median :  15.0  
##  Mean   :666408   Mean   :-36.19   Mean   :143.9   Mean   : 324.9  
##  3rd Qu.:686922   3rd Qu.:-35.55   3rd Qu.:146.7   3rd Qu.:  33.0  
##  Max.   :689285   Max.   :-12.45   Max.   :153.1   Max.   :5121.0  
##                                                                    
##     Abund.1          Abund.2        Abund.3          No.of.10km.cells
##  Min.   :   1.0   Min.   : NA    Min.   :  0.00002   Min.   :   1    
##  1st Qu.:   5.0   1st Qu.: NA    1st Qu.:  0.00011   1st Qu.:6425    
##  Median :  15.0   Median : NA    Median :  0.00037   Median :6425    
##  Mean   : 324.9   Mean   :NaN    Mean   : 14.73780   Mean   :4973    
##  3rd Qu.:  33.0   3rd Qu.: NA    3rd Qu.:  0.00173   3rd Qu.:6425    
##  Max.   :5121.0   Max.   : NA    Max.   :256.05000   Max.   :6425    
##                   NA's   :8000                                       
##       Year           Day        A_Prec_Avg30Yr   A_Psea_Avg30Yr 
##  Min.   :1760   Min.   : 1.00   Min.   : 148.7   Min.   :15.12  
##  1st Qu.:1760   1st Qu.: 9.00   1st Qu.: 299.8   1st Qu.:21.19  
##  Median :1760   Median :15.00   Median : 299.8   Median :21.19  
##  Mean   :1830   Mean   :14.94   Mean   : 401.4   Mean   :25.70  
##  3rd Qu.:1900   3rd Qu.:23.00   3rd Qu.: 309.1   3rd Qu.:21.19  
##  Max.   :1971   Max.   :31.00   Max.   :1438.0   Max.   :62.49  
##                 NA's   :7507    NA's   :7701     NA's   :7701   
##  A_TAvg_Avg30Yr  A_TMax_Avg30Yr  A_TMin_Avg30Yr   A_TSea_Avg30Yr 
##  Min.   : 8.56   Min.   :21.01   Min.   :-1.140   Min.   :282.3  
##  1st Qu.:15.86   1st Qu.:31.24   1st Qu.: 2.780   1st Qu.:554.6  
##  Median :15.86   Median :31.24   Median : 2.780   Median :563.7  
##  Mean   :15.59   Mean   :30.12   Mean   : 3.416   Mean   :516.9  
##  3rd Qu.:15.86   3rd Qu.:31.24   3rd Qu.: 3.480   3rd Qu.:563.7  
##  Max.   :21.97   Max.   :38.37   Max.   : 9.930   Max.   :655.6  
##  NA's   :7701    NA's   :7701    NA's   :7701     NA's   :7701   
##  A_TWet_Avg30Yr  A_TWrm_Avg30Yr  A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr
##  Min.   : 3.52   Min.   :14.34   Min.   :  9.57       Min.   :13.64       
##  1st Qu.:12.47   1st Qu.:22.87   1st Qu.: 19.88       1st Qu.:25.99       
##  Median :12.47   Median :22.87   Median : 19.88       Median :25.99       
##  Mean   :12.36   Mean   :22.01   Mean   : 28.07       Mean   :30.13       
##  3rd Qu.:12.47   3rd Qu.:22.87   3rd Qu.: 21.90       3rd Qu.:25.99       
##  Max.   :29.87   Max.   :29.87   Max.   :105.40       Max.   :88.57       
##  NA's   :7701    NA's   :7701    NA's   :7701         NA's   :7701        
##  A_Prec_AvgSpring30Yr A_Prec_AvgWinter30Yr DistPermWater     DistAgriLand   
##  Min.   : 11.69       Min.   :  8.97       Min.   : 0.410   Min.   :  0.69  
##  1st Qu.: 28.46       1st Qu.: 26.81       1st Qu.: 5.350   1st Qu.: 11.66  
##  Median : 28.46       Median : 26.81       Median : 5.350   Median : 35.73  
##  Mean   : 37.11       Mean   : 39.67       Mean   : 5.222   Mean   : 28.53  
##  3rd Qu.: 28.70       3rd Qu.: 29.55       3rd Qu.: 5.350   3rd Qu.: 35.73  
##  Max.   :141.27       Max.   :150.06       Max.   :60.240   Max.   :363.88  
##  NA's   :7701         NA's   :7701         NA's   :7701     NA's   :7701    
##   PercSoilClay    MinDayLength     VarDayLength       State         VegeType   
##  Min.   : 5.52   Min.   : 9.470   Min.   :1.350   ACT    :8000   4      : 217  
##  1st Qu.:27.58   1st Qu.:10.030   1st Qu.:2.140   NSW    :   0   11     :  41  
##  Median :27.60   Median :10.030   Median :2.140   NT     :   0   2      :  14  
##  Mean   :24.54   Mean   : 9.941   Mean   :2.344   QLD    :   0   7      :  13  
##  3rd Qu.:27.60   3rd Qu.:10.030   3rd Qu.:2.140   SA     :   0   3      :  12  
##  Max.   :40.25   Max.   :10.460   Max.   :3.450   TAS    :   0   (Other):   2  
##  NA's   :7701    NA's   :7701     NA's   :7701    (Other):   0   NA's   :7701  
##   Season         Month      Diseases Red.Fox  Dingo    Feral.Cat Whistling.Kite
##  1   :5465   1      :5099   0:7176   0:1868   0:1708   0:1327    0: 801        
##  2   : 277   12     : 268   1: 824   1:6132   1:6292   1:6673    1:7199        
##  3   : 405   9      : 215   2:   0                                             
##  4   : 457   7      : 190   3:   0                                             
##  NA's:1396   11     : 186                                                      
##              (Other): 714                                                      
##              NA's   :1328                                                      
##  Wallaby.Sp    Transect   
##  0: 890     a      :1000  
##  1:7110     b      :1000  
##             c      :1000  
##             d      :1000  
##             e      :1000  
##             f      :1000  
##             (Other):2000
summary(EW)
##  Occurrence_ID         Lat              Long         Occurences  
##  Min.   :683808   Min.   :-38.35   Min.   :145.3   Min.   :24.0  
##  1st Qu.:683808   1st Qu.:-38.35   1st Qu.:145.3   1st Qu.:24.0  
##  Median :683808   Median :-37.15   Median :148.2   Median :33.0  
##  Mean   :683809   Mean   :-37.75   Mean   :146.8   Mean   :28.5  
##  3rd Qu.:683809   3rd Qu.:-37.15   3rd Qu.:148.2   3rd Qu.:33.0  
##  Max.   :683809   Max.   :-37.15   Max.   :148.2   Max.   :33.0  
##                                                                  
##     Abund.1        Abund.2        Abund.3          No.of.10km.cells
##  Min.   :24.0   Min.   : NA    Min.   :0.0003735   Min.   :6425    
##  1st Qu.:24.0   1st Qu.: NA    1st Qu.:0.0003735   1st Qu.:6425    
##  Median :33.0   Median : NA    Median :0.0005136   Median :6425    
##  Mean   :28.5   Mean   :NaN    Mean   :0.0004437   Mean   :6425    
##  3rd Qu.:33.0   3rd Qu.: NA    3rd Qu.:0.0005136   3rd Qu.:6425    
##  Max.   :33.0   Max.   : NA    Max.   :0.0005136   Max.   :6425    
##                 NA's   :6000                                       
##       Year           Day       A_Prec_Avg30Yr A_Psea_Avg30Yr A_TAvg_Avg30Yr
##  Min.   :1760   Min.   : NA    Min.   : NA    Min.   : NA    Min.   : NA   
##  1st Qu.:1760   1st Qu.: NA    1st Qu.: NA    1st Qu.: NA    1st Qu.: NA   
##  Median :1760   Median : NA    Median : NA    Median : NA    Median : NA   
##  Mean   :1760   Mean   :NaN    Mean   :NaN    Mean   :NaN    Mean   :NaN   
##  3rd Qu.:1760   3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA   
##  Max.   :1760   Max.   : NA    Max.   : NA    Max.   : NA    Max.   : NA   
##                 NA's   :6000   NA's   :6000   NA's   :6000   NA's   :6000  
##  A_TMax_Avg30Yr A_TMin_Avg30Yr A_TSea_Avg30Yr A_TWet_Avg30Yr A_TWrm_Avg30Yr
##  Min.   : NA    Min.   : NA    Min.   : NA    Min.   : NA    Min.   : NA   
##  1st Qu.: NA    1st Qu.: NA    1st Qu.: NA    1st Qu.: NA    1st Qu.: NA   
##  Median : NA    Median : NA    Median : NA    Median : NA    Median : NA   
##  Mean   :NaN    Mean   :NaN    Mean   :NaN    Mean   :NaN    Mean   :NaN   
##  3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA   
##  Max.   : NA    Max.   : NA    Max.   : NA    Max.   : NA    Max.   : NA   
##  NA's   :6000   NA's   :6000   NA's   :6000   NA's   :6000   NA's   :6000  
##  A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr A_Prec_AvgSpring30Yr
##  Min.   : NA          Min.   : NA          Min.   : NA         
##  1st Qu.: NA          1st Qu.: NA          1st Qu.: NA         
##  Median : NA          Median : NA          Median : NA         
##  Mean   :NaN          Mean   :NaN          Mean   :NaN         
##  3rd Qu.: NA          3rd Qu.: NA          3rd Qu.: NA         
##  Max.   : NA          Max.   : NA          Max.   : NA         
##  NA's   :6000         NA's   :6000         NA's   :6000        
##  A_Prec_AvgWinter30Yr DistPermWater   DistAgriLand   PercSoilClay 
##  Min.   : NA          Min.   : NA    Min.   : NA    Min.   : NA   
##  1st Qu.: NA          1st Qu.: NA    1st Qu.: NA    1st Qu.: NA   
##  Median : NA          Median : NA    Median : NA    Median : NA   
##  Mean   :NaN          Mean   :NaN    Mean   :NaN    Mean   :NaN   
##  3rd Qu.: NA          3rd Qu.: NA    3rd Qu.: NA    3rd Qu.: NA   
##  Max.   : NA          Max.   : NA    Max.   : NA    Max.   : NA   
##  NA's   :6000         NA's   :6000   NA's   :6000   NA's   :6000  
##   MinDayLength   VarDayLength      State         VegeType    Season  
##  Min.   : NA    Min.   : NA    ACT    :6000   1      :   0   1:6000  
##  1st Qu.: NA    1st Qu.: NA    NSW    :   0   2      :   0   2:   0  
##  Median : NA    Median : NA    NT     :   0   3      :   0   3:   0  
##  Mean   :NaN    Mean   :NaN    QLD    :   0   4      :   0   4:   0  
##  3rd Qu.: NA    3rd Qu.: NA    SA     :   0   5      :   0           
##  Max.   : NA    Max.   : NA    TAS    :   0   (Other):   0           
##  NA's   :6000   NA's   :6000   (Other):   0   NA's   :6000           
##      Month      Diseases Red.Fox  Dingo    Feral.Cat Whistling.Kite Wallaby.Sp
##  1      :6000   0:6000   0:   0   0:   0   0:   0    0:   0         0:   0    
##  0      :   0   1:   0   1:6000   1:6000   1:6000    1:6000         1:6000    
##  2      :   0   2:   0                                                        
##  3      :   0   3:   0                                                        
##  4      :   0                                                                 
##  5      :   0                                                                 
##  (Other):   0                                                                 
##  Transect
##  a:1000  
##  b:1000  
##  c:1000  
##  d:1000  
##  e:1000  
##  f:1000  
## 

This concludes the “R Environment Set Up and Importing the Data” section. Next, we move on to “Initial Graphical Data Exploration and Research Questions”.

Initial Graphical Data Exploration and Research Questions

In this section I will make scatter plots, box plots and co-plots of the predictor variables in the various data frames to get a sense of which predictors may vary with Occurrences. As I will be making many plots of the same type, I will create plotting functions that can be called inside for loops to speed up the creation of the plots.

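# Scatter plot of y against x with a loess smoother (x and y are column names given as strings)
scatterplot_fun = function(data, x, y, na.rm = TRUE){
  ggplot(data = data, aes(x = .data[[x]], y = .data[[y]])) + 
    geom_point(na.rm = na.rm) + 
    geom_smooth(method = "loess", se = FALSE, colour = "red", na.rm = na.rm) + 
    theme_classic()
}

# Box plot of y for each level of the factor x
boxplot_fun = function(data, x, y, na.rm = TRUE){
  ggplot(data = data, aes(x = .data[[x]], y = .data[[y]])) + 
    geom_boxplot(na.rm = na.rm) + 
    theme_classic()
}
# Box plot of y for each combination of the factors x and z
# (interaction() is used because factors cannot be multiplied)
boxplot_fun2 = function(data, x, y, z, na.rm = TRUE){
   ggplot(data = data, aes(x = interaction(.data[[x]], .data[[z]]), y = .data[[y]])) + 
    geom_boxplot(na.rm = na.rm) + 
    theme_classic()
}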
coplot_fun = function(data, x, y, z, na.rm = TRUE){
  gg_coplot(data = data, x = .data[[x]], y = .data[[y]], faceting = .data[[z]], loess_family = "symmetric", size = 2) + 
    theme_classic()
}
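As a minimal sketch of how a plotting function of this kind could be driven from a for loop, the example below loops over column names rather than over the columns themselves, so each axis is labelled with the variable name instead of the loop variable. The data frame `toy`, the column `logOcc` and the vector `predictors` are hypothetical stand-ins and not part of the rabbit data set, and the function is repeated here in reduced form only so that the sketch runs on its own.

```r
library(ggplot2)

# Reduced copy of the scatter plot helper so this sketch is self-contained
scatterplot_fun <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point(na.rm = TRUE) +
    theme_classic()
}

# Hypothetical toy data standing in for the Rabbit data frame
set.seed(1)
toy <- data.frame(
  Occurences   = rpois(50, 20) + 1,
  Year         = sample(1950:2015, 50, replace = TRUE),
  DistAgriLand = runif(50, 0, 100)
)
toy$logOcc <- log(toy$Occurences)

# Loop over column names (strings), storing and printing one plot per predictor
predictors <- c("Year", "DistAgriLand")
plots <- list()
for (p in predictors) {
  plots[[p]] <- scatterplot_fun(toy, x = p, y = "logOcc")
  print(plots[[p]])
}
```

Storing the plots in a named list also makes it possible to return to a single plot later without re-running the whole loop.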

I will start with the Australia scale data set Rabbit, where I will plot Occurences against Year and each of the continuous variables that follow it in column order.

# Scatter plots of Occurences against each continuous variable (columns 9 to 27)
for(i in Rabbit[, 9:27]){
  print(ggplot(Rabbit, aes(x = i, y = Occurences)) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

The data are very bunched together when plotted against Occurences, so I will log-transform Occurences and re-plot the graphs.
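To illustrate why a log transform helps with bunched count data, here is a small sketch on simulated, right-skewed counts. The vector `occ` is hypothetical and only stands in for Occurences; the sketch compares how much of the plotted range is occupied by the lower 90% of the values before and after the transform.

```r
# Hypothetical right-skewed counts standing in for Occurences
set.seed(42)
occ <- round(rlnorm(1000, meanlog = 2, sdlog = 1.5)) + 1

# Fraction of the axis range taken up by the lower 90% of the values
range_share <- function(v) {
  unname((quantile(v, 0.9) - min(v)) / (max(v) - min(v)))
}

raw_share <- range_share(occ)       # most points squeezed into a small part of the axis
log_share <- range_share(log(occ))  # points spread over more of the axis

print(c(raw = raw_share, log = log_share))

# Note: log(0) is -Inf, so zero counts would need handling before transforming
zero_is_infinite <- is.infinite(log(0))
```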

for(i in Rabbit[, 9:27]){
  print(ggplot(Rabbit, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

The plots produced suggest that year, the precipitation variables and distance to the edge of the nearest agricultural land may have country-wide effects on rabbit occurrences, with the other plots suggesting no likely relationships. Next, I will plot boxplots of log(Occurences) against the factor variables.

for(i in Rabbit[, 28:37]){
  boxplot(log(Occurences) ~ i, data = Rabbit)
  Sys.sleep(1)
}

The boxplots suggest there is likely a difference in the number of rabbit occurrences according to State, VegeType, Season and Month. The other variables also likely show differences in rabbit occurrences, but I am sceptical of what they are showing. The boxplot for Diseases suggests that the number of rabbit occurrences is higher when more biological control diseases have been introduced. However, this may be driven by sampling effort: fewer introduced diseases means further back in time, and sampling effort decreases the further back in time we go. The boxplots for the presence/absence of potential predators and competitors suggest that, on the country scale, the number of rabbit occurrences is higher where these species are present. This could be a result of predators preying on competitors of the invasive European rabbit, and of wallaby species competing more intensely with other herbivores.
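One way to check whether Diseases is confounded with time and sampling effort is to cross-tabulate it against decade and count the records per decade. The sketch below does this on simulated data; the data frame `toy`, the sampling weights and the year cut points used to assign disease levels are arbitrary assumptions for illustration, not values from the rabbit data set.

```r
# Hypothetical toy data: later years are sampled more heavily
set.seed(7)
year <- sample(1900:2015, 2000, replace = TRUE,
               prob = seq(1, 10, length.out = 116))

# Arbitrary cut points assigning a disease level (0-3) to each year
diseases <- cut(year, breaks = c(-Inf, 1950, 1995, 2010, Inf),
                labels = 0:3, right = FALSE)
toy <- data.frame(Year = year, Diseases = diseases)

# Cross-tabulate decade against disease level
toy$Decade <- floor(toy$Year / 10) * 10
tab <- table(toy$Decade, toy$Diseases)
print(tab)

# Records per decade as a crude proxy for sampling effort
effort <- rowSums(tab)
print(effort)
```

If each decade falls almost entirely within one disease level and the record counts rise over time, as they do in this toy example, a difference between disease levels cannot be separated from a difference in sampling effort using boxplots alone.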

Now let’s look at some potential interactions between the variables. I will first use co-plots to look at interactions between the continuous and factor variables.

#for(i in Rabbit[, 9:27]){
#  for(j in Rabbit[, 28:37]){
#    coplot(Occurences ~ i | j, rows = 1, data = Rabbit)
#    Sys.sleep(1)
#  }
#}

This nested for loop takes a long time to run. Whilst I search for a solution, I will do the scatter and box plots for the other scales first and then come back to the co-plots.
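One possible way to reduce the run time, offered only as a sketch and not as the solution eventually adopted, is to draw the co-plots from a random subsample of rows and write them to a PDF file instead of pausing between plots. The data frame `toy`, the subsample size of 500 and the output file name are all assumptions made for illustration.

```r
# Hypothetical toy data standing in for the Rabbit data frame
set.seed(3)
toy <- data.frame(
  Occurences   = rpois(5000, 20) + 1,
  Year         = sample(1950:2015, 5000, replace = TRUE),
  DistAgriLand = runif(5000, 0, 100),
  Season       = factor(sample(1:4, 5000, replace = TRUE)),
  State        = factor(sample(c("ACT", "NSW", "VIC"), 5000, replace = TRUE))
)

cont_vars <- c("Year", "DistAgriLand")  # continuous predictors
fact_vars <- c("Season", "State")       # conditioning factors

# Plot a random subsample of rows rather than every record
sub <- toy[sample(nrow(toy), 500), ]

# Write every co-plot to one PDF, one page per plot, with no Sys.sleep()
out_file <- file.path(tempdir(), "coplots.pdf")
pdf(out_file, width = 10, height = 6)
n_plots <- 0
for (x in cont_vars) {
  for (f in fact_vars) {
    # Build the formula from column names so the panels are labelled properly
    fml <- as.formula(paste("log(Occurences) ~", x, "|", f))
    coplot(fml, data = sub, rows = 1)
    n_plots <- n_plots + 1
  }
}
dev.off()
```

Subsampling trades completeness for speed, so any pattern seen this way would still need to be confirmed on the full data.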

I will now move on to the State/Territory scale, starting with the Australian Capital Territory (ACT). As the natural log scale was used for the Country scale, I will use it here and for all further graphing.

#Scatter Plots
for(i in Rabbit_ACT[, 9:27]){
  print(ggplot(Rabbit_ACT, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 4887 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

## Warning: Removed 1043 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_ACT[, 28:37]){
  boxplot(log(Occurences) ~ i, data = Rabbit_ACT)
  Sys.sleep(1)
}

The possible trends found in the Australia-scale analysis were replicated for the ACT: the precipitation variables, year, distance to the nearest agricultural land edge and distance to the nearest permanent water feature may vary with rabbit occurrences in the ACT. The same possible trends in the factors at the Australia scale were seen at the ACT scale.
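The "Removed ... rows containing missing values" warnings printed by the scatter plot loop arise from NA values in the plotted columns. A quick way to see where they come from is to count the missing values per column before plotting. The sketch below uses a small hypothetical data frame `toy` in place of Rabbit_ACT.

```r
# Hypothetical toy data frame standing in for Rabbit_ACT
toy <- data.frame(
  Occurences     = c(5, 12, 3, 8, 20),
  Year           = c(1990, 1995, 2000, 2005, 2010),
  Day            = c(NA, 12, NA, 30, NA),
  A_Prec_Avg30Yr = c(600, NA, 650, 700, 620)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Rows that would be dropped from a plot of Occurences against Day
dropped_day <- sum(!complete.cases(toy[, c("Occurences", "Day")]))
print(dropped_day)
```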

Next, I will look at the New South Wales data set.

#Scatter Plots
for(i in Rabbit_NSW[, 9:27]){
  print(ggplot(Rabbit_NSW, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 1825 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_NSW[, 28:37]){
  boxplot(log(Occurences) ~ i, data = Rabbit_NSW)
  Sys.sleep(1)
}

The possible trends found in the Australia-scale analysis were replicated for NSW: the precipitation variables, year, distance to the nearest agricultural land edge and distance to the nearest permanent water feature may vary with rabbit occurrences in NSW. The same possible trends in the factors at the Australia scale were seen at the NSW scale, except for Diseases, where no trend is likely, and there are no whistling kite absences (a product of the method used to generate them).

Now I will move onto the Northern Territory data set.

#Scatter Plots
for(i in Rabbit_NT[, 9:27]){
  print(ggplot(Rabbit_NT, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 2964 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_NT[, 28:37]){
  boxplot(log(Occurences) ~ i, data = Rabbit_NT)
  Sys.sleep(1)
}

In the Northern Territory the precipitation, temperature, land use and day length variables may all vary with the number of rabbit occurrences. The season effect seen on the Australia scale may not exist in NT, and there were again no whistling kite absences, but all other factor trends seen in the other data sets appear similar in NT.

Next, I repeated the analysis for Queensland (QLD).

#Scatter Plots
for(i in Rabbit_QLD[, 9:27]){
  print(ggplot(Rabbit_QLD, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 31287 rows containing missing values (geom_point).

## Warning: Removed 24 rows containing missing values (geom_point).

## Warning: Removed 24 rows containing missing values (geom_point).

## Warning: Removed 26 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_QLD[, 28:33]){
  boxplot(log(Occurences) ~ i, data = Rabbit_QLD)
  Sys.sleep(1)
}

The plots for Queensland suggest the same trends as the Australia-wide data, except that there is no trend in year, likely due to the relatively greater sampling effort in QLD compared to the other states. It was not possible to make box plots for the animal species except for red foxes and dingoes.

Now I will move on to South Australia. There are some issues with this data to note: only VegeType can be plotted as a box plot, as Diseases, Season and Month each have only one level, and there is no data on any of the animal species.

#Scatter Plots
for(i in Rabbit_SA[, 9:27]){
  print(ggplot(Rabbit_SA, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

#Boxplots
boxplot(log(Occurences) ~ VegeType, data = Rabbit_SA)

There are plotting issues that I need to resolve later. They are due in part to there being no data for some variables and to quality issues with the data that is there, as the occurrences are all the same value.
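Problems of this kind (columns with no data, factors with a single observed level, a constant response) can be screened for before plotting, which also avoids hard-coding a different column range for each state. The following is a minimal sketch on a hypothetical data frame `toy`; the column values are invented to mimic the issues described and are not taken from Rabbit_SA.

```r
# Hypothetical toy data frame mimicking a sparse state-level subset
toy <- data.frame(
  Occurences     = c(10, 10, 10, 10),                         # constant response
  Year           = c(1990, 1995, 2000, 2005),
  A_Prec_Avg30Yr = c(NA_real_, NA, NA, NA),                   # no data at all
  VegeType       = factor(c(1, 4, 4, 7)),
  Diseases       = factor(c(0, 0, 0, 0), levels = 0:3),       # one observed level
  Red.Fox        = factor(c(NA, NA, NA, NA), levels = 0:1)    # no data at all
)

# Number of distinct non-missing values in each column
n_distinct <- vapply(
  toy,
  function(col) length(unique(col[!is.na(col)])),
  integer(1)
)
print(n_distinct)

# Columns with at least two distinct values can show variation in a plot
usable <- names(n_distinct)[n_distinct >= 2]
print(usable)
```

A response column that does not appear in `usable` signals that no plot against it can be informative, whichever predictor is chosen.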

Next, I move on to the Tasmania data set. In the TAS data set there is no data on the animal species, no variability in the number of occurrences, and only one factor level for all factors with data except for VegeType.

For the Victoria data set there is again no data for the animal species and Diseases has only one factor level, but there is variability in occurrences and so the data is plottable.

#Scatter Plots
for(i in Rabbit_VIC[, 9:27]){
  print(ggplot(Rabbit_VIC, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 12515 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 19867 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 8 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_VIC[, 28:31]){
  boxplot(log(Occurences) ~ i, data = Rabbit_VIC)
  Sys.sleep(1)
}

All of the variables that could be plotted show possible trends, as was the case in NT.

For the Western Australia data set there is again no data for the animal species, nor for the precipitation and temperature variables, and Diseases has only one factor level. There is variability in occurrences, however, so the data is plottable.

#Scatter Plots
for(i in Rabbit_WA[, 9:27]){
  print(ggplot(Rabbit_WA, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}
## Warning: Removed 16774 rows containing missing values (geom_point).

## Warning: Removed 16770 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

## Warning: Removed 18385 rows containing missing values (geom_point).

#Boxplots
for(i in Rabbit_WA[, 28:31]){
  boxplot(log(Occurences) ~ i, data = Rabbit_WA)
  Sys.sleep(1)
}

There were trends with occurrence for all the variables that had data, but the trends were quite different from those seen in the other states/territories.

Now I will finally move on to the transect-scale data sets, where the same exploratory data analysis of scatter plots, box plots and co-plots will be done for each data set. I will start with the North-South transect data.

#Scatter Plots
for(i in NS[, 9:27]){
  print(ggplot(NS, aes(x = i, y = log(Occurences))) +
          geom_point() + theme_classic())
  Sys.sleep(1)
}

## Warning: Removed 7507 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

## Warning: Removed 7701 rows containing missing values (geom_point).

#Boxplots
for(i in NS[, 28:38]){
  boxplot(log(Occurences) ~ i, data = NS)
  Sys.sleep(1)
}

In the North-South data there are possible trends in all of the continuous variables except year, day, distance to the edge of the nearest agricultural land and percentage clay in the soil. Something to note is that these possible trends appear more non-linear at this scale than at the Country and State/Territory scales. There were no likely trends between rabbit occurrences and the factor variables except for vegetation type (not all 13 types were present), season, month and red foxes. There was a possible effect of transect, which is something to note when it comes to modelling the data.
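A possible transect effect can also be summarised numerically alongside the boxplots, for example by comparing the median of log(Occurences) in each transect. The sketch below uses simulated data; the data frame `toy`, its two transects and their Poisson means are assumptions chosen only to produce a visible difference.

```r
# Hypothetical toy data: two transects with different typical counts
set.seed(11)
toy <- data.frame(
  Occurences = c(rpois(100, 10), rpois(100, 40)) + 1,
  Transect   = factor(rep(c("a", "b"), each = 100))
)

# Median of log(Occurences) within each transect
by_transect <- aggregate(log(Occurences) ~ Transect, data = toy, FUN = median)
print(by_transect)

# Number of records contributing to each transect
n_per_transect <- table(toy$Transect)
print(n_per_transect)
```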

The EW data set has no data on the continuous variables apart from abundance, and the factors have only one level or no levels at all, making this data set unusable.